Advanced B.2 - Encoding Good Habits: CLAUDE.md, Pre-registration, and Skills

🎯 What We'll Cover

A folder structure is a skeleton. This page adds the muscle: the standing instructions that make the agent keep the structure honest, the discipline of deciding what counts before you spend the compute, and the reusable Skills that turn a good workflow into one you can repeat across projects.

The centrepiece is the CLAUDE.md research-habits template — the single most useful thing this track produces. It is where the integrity habits the whole course has taught stop being good intentions and become rules the agent follows every time.

📋 The CLAUDE.md Research-Habits Template

Lesson A.3 introduced CLAUDE.md as the control surface — the file Claude Code reads at the start of every session. Here it does its real job: encoding research integrity as machine-readable rules. This template is adapted, with thanks, from the AGENTS.md conventions in Dominik Lukeš's Using AI Agents for Reproducible Research workshop (Oxford), and extended with the harder-edged habits that working researchers tend to add once they have been burned a few times.

# CLAUDE.md — <project name>

## What this project is
- One line on the project, the data, and the question.
- Status: e.g. “fictional teaching project” / “real, unpublished — do not share”.

## Working rules
- Read files before proposing changes.
- Never modify anything in data/raw/. It is the only copy.
- Use plan mode and show me the plan before any consequential change.
- When you summarise or analyse, name which files you used.
- If something is uncertain, say so — never fill a gap with a plausible guess.
- Save reusable code in scripts/; save generated outputs in outputs/.
- Log every consequential decision in notes/decision-log.md, dated, with the reason.
- Run analysis scripts to a committed log (... 2>&1 | tee outputs/<name>_run.log);
  the log, not the figure, is the record of what happened.
- Work from logs and printed tables, not screenshots, unless I ask otherwise.
- Before you tell me a task is done, show me the diff.

## Pre-registration
- If the pre-registrations/ folder has an entry for the current question,
  it is binding. Do not tune on the headline metric; do not reframe a
  result after the fact. If reality contradicts the prediction, apply the
  decision rule and say so plainly.

## Boundaries
- Don't add claims about real people, studies, or institutions that aren't in the sources.
- Don't commit or push unless I ask; don't send anything on my behalf.
- British spelling.

💡 Why this is the hinge of the whole track

Read those rules again and notice what they are. Almost every line is a research-integrity habit this course already taught — now written somewhere the agent obeys it every time:

“Never fill a gap with a plausible guess” is the Week 9 hallucination lesson.
“Name which files you used” is the Week 5 citation-integrity lesson.
“Log every consequential decision” is reproducibility itself.
“Show me the diff before done” is verification (Weeks 9–10).
“Don't reframe a result after the fact” is the calibrated-reading disposition the whole course is built on.

The disposition the course spent eleven weeks building collapses, here, into one file you can drop into any project.

📥 Download: the CLAUDE.md research-habits template

The template above as a ready-to-edit file. Drop it into the root of a project, fill in the top two lines, and adjust the rules to fit. CLAUDE-md-research-template.md

📝 Pre-registration: Decide What Counts Before You Spend the Compute

This is the habit that most separates careful computational research from the other kind, and it goes beyond anything in the source workshop. The idea is simple and old — it comes from clinical trials, and the case for making it a routine research habit rather than a clinical-trials formality is laid out in Nosek et al.'s The Preregistration Revolution (2018, PNAS) — and it transfers directly to any experiment where you can run a cheap pilot: write down what you expect to find, and the rule for deciding, before you run the thing.

Concretely, a pre-registration is a short file, committed to pre-registrations/ before any serious compute, that states: the question; what an interesting, a boring-but-worth-knowing, and a dead result would each look like, in advance; the analysis you will run; and the decision rule you will follow. The point is to bind your future self. When the result comes back and it is disappointing, the temptation is to quietly re-frame — to tune the metric, move the threshold, tell a flattering story. A committed pre-registration is what stops you, because the standard was set when you had nothing to defend.

# pre-registration: do microplastic counts differ across the three sites?
# committed 2026-03-20, before running the analysis

Question: Is the median microplastic count higher at the town site than
  at the upstream site?

Prediction: Town > upstream, by a margin large enough to matter ecologically
  (we set “matters” at a difference of ≥ 20 particles/L in advance).

Decision rule:
  - Interesting: town median exceeds upstream by ≥ 20 particles/L AND the
    difference survives a Mann-Whitney U test at p < 0.05.
  - Boring-but-worth-knowing: a real but smaller difference, or no difference.
    Either way we report it straight.
  - Dead: the data are too sparse or too messy after cleaning to test at all.

Fixed in advance: outlier rule (exclude only samples the field notes mark
  unreliable); missing-value handling (listwise); the test (Mann-Whitney U,
  because counts are not normally distributed). No tuning of these on results.

👤 From the instructor's own practice

“It’s very easy now to come up with ideas and to explore them. The important point is to figure out in advance what are going to be the markers for success as you go along, so writing them down before you start is vital.”

“Of course sometimes you get a completely surprising result, but mostly you define a question and come up with what would correspond to an interesting outcome at each stage (starting off small) — and if it fails the test as you go along, that’s generally the time to stop.”

“This means you have to define interesting/important carefully upfront, but once you’ve defined it, unless there’s something really strange that happens, stick to it and save the compute. This is the pre-registration process: you register the gates at the start, and if the gates are closed as you go along, you stop.”

That staged framing is what makes pre-registration practical rather than bureaucratic. You rarely commit one giant analysis up front and judge it only at the end; instead you set gates at each stage and run the cheapest stage first. A small pilot either clears its pre-registered gate or it does not. If it clears, you have earned the right to spend more compute on the next stage; if it does not, you stop — and you have spent almost nothing finding out. Pre-registration and cheap pilots belong together: the gates tell you, in advance and honestly, how far a question deserves to be taken before the next increment of effort. The discipline is not “decide everything before you start”; it is “decide the gate before each step, and let a closed gate stop you.”

Pre-registration is the operational form of the course's deepest theme: ask “is this interesting?” not just “is this publishable?”, and be honest when the data say no. It is also where the agent earns its keep without being given a chance to flatter you: the CLAUDE.md rule above tells Claude Code that an existing pre-registration is binding, so the agent that runs your analysis is the same one holding you to the standard you set before you saw the answer.

🧩 Skills and Subagents: Packaging Reproducible Practice

A CLAUDE.md encodes the rules of one project. A Skill packages a reusable workflow that travels across projects. Where a rule says “always log decisions,” a Skill says “here is exactly how to build a data inventory, every time” — a tested procedure rather than a re-improvised prompt.

A Skill is a folder under .claude/skills/ with a SKILL.md file: a little YAML header naming it and saying when to use it, then concise instructions. For example, a reproducible-research skill for creating a data inventory:

# .claude/skills/data-inventory/SKILL.md
---
name: data-inventory
description: >
  Create docs/data-inventory.md describing each raw data file, its
  variables, units, row counts, and visible problems. Use when a project
  has data in data/raw/ and no inventory yet.
---

Inspect every file in data/raw/ (read-only). For each file, record:
the filename, what it appears to contain, the columns and their units,
the number of rows, missing-value counts, and any obvious anomalies
(impossible values, inconsistent column names, mixed date formats).
Do not edit the raw files. Where something is ambiguous, say so —
do not guess. Cross-check against any field notes and flag contradictions.
Write the result to docs/data-inventory.md.

Written once and inspected, that Skill produces the same disciplined inventory on every project you use it in. That repeatability is the reproducibility win: a Skill is a procedure you have tested and can trust, the opposite of ad-hoc prompting that varies every time.

📦 Download: seven research Skills

The data-inventory Skill above is deliberately small, to show the shape. To see the idea at full size — and to actually use it — here is a bundle of seven ready-to-install Skills for reproducible research: pre-register (idea triage before you spend compute), arxiv-fetcher and academic-pdf-to-mkd (getting papers into a workable form), paper-review, claim-fact-checker and reference-verifier (the Week 9 verification habit, packaged as procedures), and convergence-check (an honest read on whether an iterative loop is converging or just spinning). Unzip, drop the folders into ~/.claude/skills/, and Claude Code loads each one on demand. co-scientist-skills.zip

Three of the seven — academic-pdf-to-mkd, claim-fact-checker, and paper-review — are adapted, with thanks, from Dominik Lukeš’s MIT-licensed agent-skills collection (github.com/techczech/dominiks-agent-skills); the bundle ships with a LICENSE and full attribution. The other four are the instructor’s own.

Subagents add a second move. For a bounded job, Claude Code can spawn a separate, focused agent — and the most useful research application is verification. After the main agent has produced an analysis, you can have an independent subagent check it: re-derive the result from the raw data, look for the errors the first agent might have made, and report disagreements. This is the Week 9 adversarial-verification idea made operational — a second pair of eyes that does not share the first agent's assumptions.

🔎 A note on trust

A verification subagent is genuinely useful, but it is not magic: it is another instance of the same kind of system, and it can share the same blind spots. Treat it as one more screen, not as a guarantee — exactly as Week 9 said of any single check. The human who reads the disagreements and decides what they mean is still you.

Coming up in B.3: we put all of it together — a full analysis of the Berg River data from messy files to a result a stranger could reproduce, then how to verify agentic work honestly, and why the reproducible folder turns out to be the best research-disclosure you can offer.